Smoothed N-best-based speaker adaptation for speech recognition

Authors

  • Tomoko Matsui
  • Tatsuo Matsuoka
  • Sadaoki Furui
Abstract

Smoothed estimation and utterance verification are introduced into the N-best-based speaker adaptation method. That method is effective even for speakers whose decodings using speaker-independent (SI) models are error-prone, that is, for speakers for whom adaptation techniques are truly needed. The smoothed estimation improves the performance for such speakers, and the utterance verification reduces the required amount of calculation. Performance evaluation using connected-digit (four-digit strings) recognition experiments performed over actual telephone lines showed a reduction of 36.4% in the error rates for speakers whose decodings using SI models are error-prone. To try to find an effective model transformation for speaker adaptation, we discuss replacing mixture-mean bias estimation by the widely used mixture-mean linear-regression-matrix estimation.
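The contrast between the two mean transformations named in the abstract can be sketched as follows. This is only an illustrative sketch, not the authors' estimator: it assumes identity covariances for all mixtures, takes the frame-to-mixture occupation weights (`gammas`) as given, and simplifies the regression matrix to a diagonal one so that it can be solved in closed form. The function names `estimate_bias` and `estimate_diag_affine` are hypothetical.

```python
def estimate_bias(frames, means, gammas):
    """Shared bias b minimizing sum_t sum_m gamma[t][m] * ||x_t - (mu_m + b)||^2.

    Assumption: identity covariances, so the solution is a weighted
    average of the residuals (x_t - mu_m).
    """
    dim = len(means[0])
    num = [0.0] * dim
    den = 0.0
    for x, g_row in zip(frames, gammas):
        for mu, g in zip(means, g_row):
            den += g
            for d in range(dim):
                num[d] += g * (x[d] - mu[d])
    return [n / den for n in num]


def estimate_diag_affine(frames, means, gammas):
    """Per-dimension scale a_d and offset b_d fitted by weighted least squares.

    Assumption: the regression matrix is restricted to a diagonal, so each
    dimension is fitted independently as x_d ~ a_d * mu_d + b_d.
    """
    dim = len(means[0])
    scales, offsets = [], []
    for d in range(dim):
        s_w = s_m = s_x = s_mm = s_mx = 0.0
        for x, g_row in zip(frames, gammas):
            for mu, g in zip(means, g_row):
                s_w += g
                s_m += g * mu[d]
                s_x += g * x[d]
                s_mm += g * mu[d] * mu[d]
                s_mx += g * mu[d] * x[d]
        det = s_w * s_mm - s_m * s_m
        # If all means coincide in this dimension, the scale is not
        # identifiable; fall back to a pure bias (a = 1).
        a = 1.0 if abs(det) < 1e-12 else (s_w * s_mx - s_m * s_x) / det
        b = (s_x - a * s_m) / s_w
        scales.append(a)
        offsets.append(b)
    return scales, offsets


# Toy data: two 2-dimensional mixture means, one frame assigned to each.
means = [[0.0, 0.0], [2.0, 4.0]]
gammas = [[1.0, 0.0], [0.0, 1.0]]
shifted = [[1.0, -1.0], [3.0, 3.0]]   # each mean shifted by (1, -1)
scaled = [[0.5, 0.5], [4.5, 8.5]]     # each mean mapped to 2 * mu + 0.5

bias = estimate_bias(shifted, means, gammas)
scales, offsets = estimate_diag_affine(scaled, means, gammas)
```

A bias has only one free parameter per dimension, which is why it can be estimated from very little adaptation data; a regression transform has more parameters and can model scaling as well as shifting, at the cost of needing more data to estimate reliably.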


Similar resources

Speaker Adaptation in Continuous Speech Recognition Using MLLR-Based MAP Estimation

A variety of methods are used for speaker adaptation in speech recognition. In some techniques, such as MAP estimation, only the models with available training data are updated. Hence, large amounts of training data are required in order to have significant recognition improvements. In some others, such as MLLR, where several general transformations are applied to model clusters, the results ar...


Convolutional neural network with adaptive windows for speech recognition

Although speech recognition systems are widely used and their accuracy continues to increase, a considerable gap remains between their accuracy and human recognition ability. This is partly due to high speaker variation in the speech signal. Deep neural networks are among the best tools for acoustic modeling. Recently, using hybrid deep neural network and hidden Markov mo...


Improving pronunciation modeling for non-native speech recognition

In this paper, three different approaches to pronunciation modeling are investigated. Two existing pronunciation modeling approaches, namely the pronunciation dictionary and the n-best rescoring approach, are modified to work with a small amount of non-native speech. We also propose a speaker clustering approach, which is capable of grouping the speakers based on their pronunciation habits. Given some s...


N-best-based instantaneous speaker adaptation method for speech recognition

An instantaneous speaker adaptation method is proposed that uses N-best decoding for continuous mixture-density hidden-Markov-model based speech recognition systems. An N-best paradigm of multiple-pass search strategies is used that makes this method effective even for speakers whose decodings using speaker-independent models are error-prone. To cope with an insufficient amount of data, our method...



Publication date: 1997